Presented by:
Orlando Morales - 30190412
Soma Dipti - 30185455
Swarnim Khosla - 30188716
Introduction
The economy is the area of study for our research. We want to learn how Canada and Alberta are faring as we emerge from one of the most devastating global health crises in recent history. Everyone has been affected in some way by the losses of the COVID-19 pandemic over the past two and a half years. GDP and CPI are two of the most important indicators of a nation's, and its citizens', economic health. Oil and gas is a pertinent factor for our study of Canada and Alberta because it is one of the largest contributors to Canada's GDP. Alberta produces the majority of Canada's oil, so we examine the oil and gas industry to better understand how the province's economy was affected and how it is recovering.
The importance of this issue lies in gaining a broad perspective on the nation's current state, which directly helps a person make wise financial decisions. Our datasets were split into four groups based on market factors that could provide insight into the state of the economy.
The first category contains four datasets used to examine the Canadian economy and pinpoint key variables driving change. The first includes the GDP and growth rate for each nation, while the next two include population data and world-map coordinates; from these we compute GDP per capita. The fourth dataset provides the daily oil price index across the years. These figures allow us to compare the current rate of growth with that of previous years, and to determine whether any relationship exists between the daily global oil price index and GDP per capita.
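As a minimal sketch of the per-capita calculation described above (the column names and figures here are illustrative, not the actual dataset headers):

```python
import pandas as pd

# Toy GDP and population tables keyed by ISO country code
# (values are illustrative, not the real dataset's figures).
gdp = pd.DataFrame({"iso_a3": ["CAN", "USA"],
                    "gdp_usd": [1.99e12, 2.30e13]})
pop = pd.DataFrame({"iso_a3": ["CAN", "USA"],
                    "pop_est": [3.76e7, 3.28e8]})

# Join on the country code, then divide GDP by population.
merged = pd.merge(gdp, pop, how="inner", on="iso_a3")
merged["gdp_pc"] = merged["gdp_usd"] / merged["pop_est"]
```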
We used a single dataset containing the consumer price index (CPI) across industries over the years to determine how prices for products and services have changed over time, which helps us analyse the impact of the pandemic. The behaviour of the highest contributor is observed and compared against the recovery to draw conclusions.
To understand the economic recovery following the pandemic, we examined the revenue produced by the industry that contributes the most. This was accomplished using seven datasets containing company-specific sales information from 2016 to 2021. These figures were important because they let us calculate annual gross and net revenue, as well as royalties paid to the government in the form of taxes, which allowed us to assess the economic recovery.
The first two datasets in our last category are the historical lending rates and prime interest rates as determined by the Government of Alberta. The next two contain average weekly earnings by industry and the unemployment rate for young adults and those over the age of 15. With all these datasets combined, we can infer Alberta's and Canada's overall economic health, helping one make informed financial decisions.
Guiding Questions:
How was Canada's GDP affected by COVID?
Which of Alberta's sectors was most impacted during COVID?
How has inflation impacted different aspects of people's lives?
Conclusion: General situation
These questions are crucial for tracking the pandemic's effects, the province's present economic state, and Canada's general situation. The answers will give insight into the financial situation and aid in decision-making.
Files available at:
https://drive.google.com/drive/folders/1Yj5482TAmhggvqYr6_NgouR2Vbdw4Pf-?usp=sharing
#Import all libraries to be used:
#data frames
import numpy as np
import pandas as pd
import geopandas as gpd
#datetime
import datetime as dt
import calendar
#statistical
from scipy.stats import pearsonr
#visualizations
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sns
from plotly.subplots import make_subplots
import plotly.express as px
import plotly.graph_objects as go
# Display all cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
#Install openpyxl
import sys
!{sys.executable} -m pip install openpyxl
Guiding question 1: How was Canada's GDP affected by COVID?
Data set used:
Libraries used for visualizations:
Data Cleaning:
To answer the first guiding question, the datasets containing GDP data, population data, and coordinates for the map's presentation were merged on their shared country-name and country-code columns. For both datasets, data-cleaning steps included renaming columns, inserting and calculating columns, formatting numbers, and transposing data frames.
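The cleaning operations listed above can be sketched on a toy frame (the column names and numbers here are illustrative only):

```python
import pandas as pd

# Toy frame standing in for the raw GDP table.
df = pd.DataFrame({"Country Name": ["Canada"],
                   "2020": [1.645e12],
                   "2021": [1.991e12]})

df = df.rename(columns={"Country Name": "Country_Name"})  # column renaming
df.insert(1, "Growth", df["2021"] / df["2020"] - 1)       # insertion + calculation
long_form = df.T                                          # transposing
```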
#Read data for GDP and Map
rdata_gdp_country = pd.read_csv("./gdp_csv.csv")
rdata_gdp_country2022 = pd.read_csv("./gdp1_2022.csv")
coordinates = pd.DataFrame(pd.read_html('https://developers.google.com/public-data/docs/canonical/countries_csv')[0])
rdata_gdp_country.head()
rdata_gdp_country2022.head()
| Country Name | Country Code | Year | Value | |
|---|---|---|---|---|
| 0 | Arab World | ARB | 1968 | 2.576068e+10 |
| 1 | Arab World | ARB | 1969 | 2.843420e+10 |
| 2 | Arab World | ARB | 1970 | 3.138550e+10 |
| 3 | Arab World | ARB | 1971 | 3.642691e+10 |
| 4 | Arab World | ARB | 1972 | 4.331606e+10 |
| Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | GDP (current US$) | NY.GDP.MKTP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 2.615084e+09 | 2.727933e+09 | 2.791061e+09 | 2.963128e+09 | 2.983799e+09 | 3.092179e+09 | 3.202235e+09 | 3.310056e+09 | 2.496648e+09 | NaN |
| 1 | Africa Eastern and Southern | AFE | GDP (current US$) | NY.GDP.MKTP.CD | 2.129059e+10 | 2.180847e+10 | 2.370702e+10 | 2.821004e+10 | 2.611879e+10 | 2.968217e+10 | ... | 9.730430e+11 | 9.839370e+11 | 1.003680e+12 | 9.242530e+11 | 8.823550e+11 | 1.020650e+12 | 9.910220e+11 | 9.975340e+11 | 9.216460e+11 | 1.082100e+12 |
| 2 | Afghanistan | AFG | GDP (current US$) | NY.GDP.MKTP.CD | 5.377778e+08 | 5.488889e+08 | 5.466667e+08 | 7.511112e+08 | 8.000000e+08 | 1.006667e+09 | ... | 1.990732e+10 | 2.014640e+10 | 2.049713e+10 | 1.913421e+10 | 1.811656e+10 | 1.875347e+10 | 1.805323e+10 | 1.879945e+10 | 2.011614e+10 | NaN |
| 3 | Africa Western and Central | AFW | GDP (current US$) | NY.GDP.MKTP.CD | 1.040414e+10 | 1.112789e+10 | 1.194319e+10 | 1.267633e+10 | 1.383837e+10 | 1.486223e+10 | ... | 7.275700e+11 | 8.207930e+11 | 8.649900e+11 | 7.607340e+11 | 6.905460e+11 | 6.837490e+11 | 7.416900e+11 | 7.945430e+11 | 7.844460e+11 | 8.358080e+11 |
| 4 | Angola | AGO | GDP (current US$) | NY.GDP.MKTP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.249980e+11 | 1.334020e+11 | 1.372440e+11 | 8.721929e+10 | 4.984049e+10 | 6.897276e+10 | 7.779294e+10 | 6.930910e+10 | 5.361907e+10 | 7.254699e+10 |
5 rows × 66 columns
#countries transformations
countries = rdata_gdp_country2022['Country Name'],rdata_gdp_country2022['2021'],rdata_gdp_country2022['Country Code']
countries = pd.DataFrame(countries).T
countries['2021'] = countries['2021'].astype('float')
countries.rename(columns={'Country Name':'name','Country Code':'iso_a3'},inplace=True)
countries['2021'] = countries['2021'].fillna(0) #fill missing GDP once; no loop needed
countries = pd.merge(coordinates, countries, how='outer', on='name')
#World Map transformation
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = pd.merge(world, countries, how='inner', on='iso_a3')
world['gdp_pc'] = world['2021']/world['pop_est']
#print data frames: Countries GDP and Coordinates
countries.head()
world.head()
| country | latitude | longitude | name | 2021 | iso_a3 | |
|---|---|---|---|---|---|---|
| 0 | AD | 42.546245 | 1.601554 | Andorra | 3.329911e+09 | AND |
| 1 | AE | 23.424076 | 53.847818 | United Arab Emirates | 0.000000e+00 | ARE |
| 2 | AF | 33.939110 | 67.709953 | Afghanistan | 0.000000e+00 | AFG |
| 3 | AG | 17.060816 | -61.796428 | Antigua and Barbuda | 1.471126e+09 | ATG |
| 4 | AI | 18.220554 | -63.068615 | Anguilla | NaN | NaN |
| pop_est | continent | name_x | iso_a3 | gdp_md_est | geometry | country | latitude | longitude | name_y | 2021 | gdp_pc | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 889953.0 | Oceania | Fiji | FJI | 5496 | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... | FJ | -16.578193 | 179.414413 | Fiji | 4.592119e+09 | 5159.956436 |
| 1 | 58005463.0 | Africa | Tanzania | TZA | 63177 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... | TZ | -6.369028 | 34.888822 | Tanzania | 6.777510e+10 | 1168.426184 |
| 2 | 37589262.0 | North America | Canada | CAN | 1736425 | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... | CA | 56.130366 | -106.346771 | Canada | 1.990760e+12 | 52960.869516 |
| 3 | 328239523.0 | North America | United States of America | USA | 21433226 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | US | 37.090240 | -95.712891 | United States | 2.299610e+13 | 70058.900250 |
| 4 | 18513930.0 | Asia | Kazakhstan | KAZ | 181665 | POLYGON ((87.35997 49.21498, 86.59878 48.54918... | KZ | 48.019573 | 66.923684 | Kazakhstan | 1.908140e+11 | 10306.509747 |
#plot parameters for World map
fig, ax = plt.subplots(1, 1)
orig_map=plt.cm.get_cmap('summer')
reversed_map = orig_map.reversed()
rcParams['figure.figsize'] = 12,8
wm = world.plot(column='gdp_pc',cmap = reversed_map, ax=ax, legend=True,
legend_kwds={'label': "GDP per capita by Country 2021",'orientation': "horizontal"})
ax.set(title='World Map \n GDP per Capita')
plt.show()
The world map shows GDP per capita, with greener shades indicating higher values. All countries are shown, and it is clear that Canada falls in the range of roughly $50,000 to $60,000 per person.
The dataset covering the entire world was used for the section below, so data processing included renaming and cleaning up columns and removing or filling in blanks and zeroes. For the visualizations, the Country column was filtered to Canada.
#Data cleaning for GDP
#data set to work with
gdpset = rdata_gdp_country2022
#Eliminate blanks from column names
gdpset.rename(columns={
'Country Name':'Country_Name',
'Country Code':'Country_Code',
'Indicator Name':'Indicator_Name',
'Indicator Code':'Indicator_Code'},
inplace=True)
#columns to drop
gdpset_to_drop = ['Indicator_Code','Indicator_Name','Country_Code',
'1960','1961','1962','1963','1964','1965','1966','1967','1968','1969',
'1970','1971','1972','1973','1974','1975','1976','1977','1978','1979',
'1980','1981','1982','1983','1984','1985','1986','1987','1988','1989',
'1990','1991','1992','1993','1994','1995','1996','1997','1998','1999',
'2000','2001','2002','2003','2004','2005','2006','2007','2008','2009'
]
gdpset = gdpset.drop(gdpset_to_drop,axis=1)
#filling blanks and assigning NaN
for column in gdpset:
    gdpset[column] = gdpset[column].fillna(np.NaN)
    gdpset[column] = gdpset[column].replace(0,np.NaN)
#print data frame: GDP
gdpset.head()
| Country_Name | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | 2.453631e+09 | 2.637989e+09 | 2.615084e+09 | 2.727933e+09 | 2.791061e+09 | 2.963128e+09 | 2.983799e+09 | 3.092179e+09 | 3.202235e+09 | 3.310056e+09 | 2.496648e+09 | NaN |
| 1 | Africa Eastern and Southern | 8.604780e+11 | 9.644180e+11 | 9.730430e+11 | 9.839370e+11 | 1.003680e+12 | 9.242530e+11 | 8.823550e+11 | 1.020650e+12 | 9.910220e+11 | 9.975340e+11 | 9.216460e+11 | 1.082100e+12 |
| 2 | Afghanistan | 1.585668e+10 | 1.780511e+10 | 1.990732e+10 | 2.014640e+10 | 2.049713e+10 | 1.913421e+10 | 1.811656e+10 | 1.875347e+10 | 1.805323e+10 | 1.879945e+10 | 2.011614e+10 | NaN |
| 3 | Africa Western and Central | 5.915960e+11 | 6.709830e+11 | 7.275700e+11 | 8.207930e+11 | 8.649900e+11 | 7.607340e+11 | 6.905460e+11 | 6.837490e+11 | 7.416900e+11 | 7.945430e+11 | 7.844460e+11 | 8.358080e+11 |
| 4 | Angola | 8.169956e+10 | 1.094370e+11 | 1.249980e+11 | 1.334020e+11 | 1.372440e+11 | 8.721929e+10 | 4.984049e+10 | 6.897276e+10 | 7.779294e+10 | 6.930910e+10 | 5.361907e+10 | 7.254699e+10 |
#Plot GDP by time, Canada vs average world
#Filter by Canada
gdpcan = gdpset[(gdpset.Country_Name == 'Canada')]
#manipulating data to plot
gdpcan = gdpcan.melt(id_vars=['Country_Name'],var_name="Year")
gdpcan.rename(columns={'value':'GDP'},inplace=True)
#plot
gdpcan_plt = px.line(gdpcan, x='Year', y='GDP', height=500)
gdpcan_plt= gdpcan_plt.update_layout(title='Canada\'s Gross Domestic Product (GDP) by Year')
gdpcan_plt= gdpcan_plt.update_traces(line=dict(color='orange'))
gdpcan_plt= gdpcan_plt.add_vrect(x0="9", x1="11", annotation_text="COVID19", annotation_position="top right",
fillcolor="green", opacity=0.25, line_width=0)
gdpcan_plt= gdpcan_plt.add_vrect(x0="4", x1="6", annotation_text="Recession", annotation_position="top right",
fillcolor="green", opacity=0.25, line_width=0)
gdpcan_plt.show()
In the time series of GDP above, two interesting periods are marked: the first is the 2015 recession and the second is the COVID-19 pandemic. Both periods show a decrease in GDP, and after each a recovery can be seen.
The dataset containing historical oil prices was read using skiprows, and the date column was parsed for the following steps. For the visualisation, Plotly Express was used to plot oil prices.
#raw oil price
rdata_gdp_oilprice = pd.read_csv("./wti-crude-oil-prices-10-year-daily-chart.csv",skiprows=15,parse_dates=['date'])
#manipulation of date
dseries = rdata_gdp_oilprice['date'].map(lambda x: x.day)
mseries = rdata_gdp_oilprice['date'].map(lambda x: x.month)
yseries = rdata_gdp_oilprice['date'].map(lambda x: x.year)
rdata_gdp_oilprice.insert(1,'Day',dseries)
rdata_gdp_oilprice.insert(2,'Month',mseries)
rdata_gdp_oilprice.insert(3,'Year',yseries)
#print data frame: GDP and Oil Price by month
rdata_gdp_oilprice.head()
| date | Day | Month | Year | value | |
|---|---|---|---|---|---|
| 0 | 2012-10-10 | 10 | 10 | 2012 | 91.25 |
| 1 | 2012-10-11 | 11 | 10 | 2012 | 92.07 |
| 2 | 2012-10-12 | 12 | 10 | 2012 | 91.86 |
| 3 | 2012-10-15 | 15 | 10 | 2012 | 91.85 |
| 4 | 2012-10-16 | 16 | 10 | 2012 | 92.09 |
#subset oilprice
oil_price = rdata_gdp_oilprice
oil_price.rename(columns={' value':'Oil_price','date':'Date'},inplace=True)
oil_price.dropna(subset = ["Oil_price"], inplace=True)
#plot Oil price
oil_plt = px.line(oil_price, x='Date', y='Oil_price', height=500,title="Time Series for Oil Price")
oil_plt=oil_plt.update_layout(xaxis=dict(rangeslider=dict(visible=True)))
oil_plt.show()
The chart above shows oil prices over time. As with the previous chart, the same periods (the 2015 recession and the COVID-19 pandemic) show a drop in oil prices.
#manipulate variables and create a new data frame for correlation
gdpcor = gdpcan
gdpcor['Year'] = gdpcor['Year'].astype(int)
op_dropc = ['Date','Day','Month']
opcor = oil_price.drop(op_dropc, axis=1)
opcor = opcor.groupby('Year', as_index=False).mean()
gdpop = pd.merge(gdpcor,opcor,how='inner',on='Year')
gdpop_pearson = pearsonr(gdpop['GDP'],gdpop['Oil_price'])
#print data frame: GDP and Oil Price
gdpop.head()
| Country_Name | Year | GDP | Oil_price | |
|---|---|---|---|---|
| 0 | Canada | 2012 | 1.828370e+12 | 87.942140 |
| 1 | Canada | 2013 | 1.846600e+12 | 98.068722 |
| 2 | Canada | 2014 | 1.805750e+12 | 92.849937 |
| 3 | Canada | 2015 | 1.556510e+12 | 48.834810 |
| 4 | Canada | 2016 | 1.527990e+12 | 43.559508 |
In the steps below, the dataset used was a little challenging because the date could not be parsed directly. The date is therefore reassembled into a standard format and converted to a date type, and the columns are renamed and grouped by category.
An aggregation by the mean of each category is computed so that the visualization can be arranged in ascending order; the idea of the visualization is to see where the Energy field is placed for Alberta.
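As an aside, dates in this abbreviated-month/two-digit-year style (e.g. "Jul-06") can usually be parsed in one step with an explicit format string; a small sketch, assuming the sample strings match the file's layout:

```python
import pandas as pd

# Sample strings assumed to follow the file's "Jul-06" layout
# (abbreviated month name, two-digit year).
raw = pd.Series(["Jul-06", "Aug-06", "Jan-21"])

# "%b-%y" parses the month abbreviation and two-digit year directly;
# the day defaults to the first of the month.
parsed = pd.to_datetime(raw, format="%b-%y")
```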
#read raw data
rdata_cpi = pd.DataFrame(pd.read_csv("./stc_18-10-0004-01_consumer_price_index_csv_v48.0_2022-09-20.csv"))
#date is imported as object and cannot change to data time in one step (not that we know)
new_date = rdata_cpi["Date"].str.split("-", n = 1, expand = True)
rdata_cpi.insert(1,'Month', new_date[0])
rdata_cpi.insert(2,'Year',new_date[1])
rdata_cpi['Month'] = pd.Series(rdata_cpi['Month'], dtype="string")
rdata_cpi['Year'] = pd.Series(rdata_cpi['Year'], dtype="string")
rdata_cpi = rdata_cpi.drop('Date',axis=1)
rdata_cpi.insert(0,'Date','01-' + new_date[0]+'-'+new_date[1])
rdata_cpi['Date']= pd.to_datetime(rdata_cpi['Date'])
#print data frame: Raw Data CPI
rdata_cpi.head()
| Date | Month | Year | GEO | Products_and_product_groups | Value | Percent_Change | |
|---|---|---|---|---|---|---|---|
| 0 | 2006-07-01 | Jul | 06 | Alberta | All-items | 113.4 | NaN |
| 1 | 2006-07-01 | Jul | 06 | Alberta | Food | 108.3 | NaN |
| 2 | 2006-07-01 | Jul | 06 | Alberta | Shelter | 126.1 | NaN |
| 3 | 2006-07-01 | Jul | 06 | Alberta | Household operations, furnishings and equipment | 102.6 | NaN |
| 4 | 2006-07-01 | Jul | 06 | Alberta | Clothing and footwear | 96.2 | NaN |
#Manipulate cpi
cpi_m = rdata_cpi[(rdata_cpi.Products_and_product_groups == 'Energy')]
cpi_m = cpi_m.groupby(['Year'],as_index=False).sum() #group the filtered Energy rows, not the full frame
cpi_m['Year'] = pd.Series('20'+cpi_m['Year'])
cpi_m['Year'] = pd.Series(cpi_m['Year'], dtype='int')
cpi_m = cpi_m.loc[(cpi_m['Year'] > 2009) & (cpi_m['Year'] < 2022)]
cpi_m = cpi_m.drop('Percent_Change',axis=1)
cpi_m.rename(columns={'Value':'CPI_Energy'},inplace=True)
#Manipulate gdp
gdp_m = gdpset[(gdpset.Country_Name == 'Canada')]
gdp_m = gdp_m.melt(id_vars=['Country_Name'],var_name="Year")
gdp_m = gdp_m.drop('Country_Name',axis=1)
gdp_m['Year'] = pd.Series(gdp_m['Year'], dtype='int')
gdp_m.rename(columns={'value':'GDP'},inplace=True)
#Merge both dataframes
gdp_cpi = pd.merge(gdp_m, cpi_m, how='inner', on='Year')
#print data frame: GDP and CPI
gdp_cpi.head()
| Year | GDP | CPI_Energy | |
|---|---|---|---|
| 0 | 2010 | 1.617340e+12 | 28677.4 |
| 1 | 2011 | 1.793330e+12 | 29724.1 |
| 2 | 2012 | 1.828370e+12 | 30066.6 |
| 3 | 2013 | 1.846600e+12 | 30396.4 |
| 4 | 2014 | 1.805750e+12 | 31108.2 |
#Manipulate data
cpiv = rdata_cpi[(rdata_cpi.GEO == 'Alberta')]
cpiv = cpiv.drop('Date',axis=1)
cpiv = cpiv.drop('Month',axis=1)
cpiv.rename(columns={'Value':'CPI'},inplace=True)
#replace values in column Products and product groups
cpiv["Products_and_product_groups"] = cpiv["Products_and_product_groups"].map(
{"Household operations, furnishings and equipment":"Housing",
"Recreation, education and reading":"Recreation",
"Alcoholic beverages, tobacco products and recreational cannabis":"Alcohol",
"All-items":"All-items",
"Food":"Food",
"Shelter":"Shelter",
"Clothing and footwear":"Clothing and footwear",
"Transportation":"Transportation",
"Health and personal care":"Health and personal care",
"Energy":"Energy"})
#Calculate mean of each group
cpiv = cpiv.groupby(['Products_and_product_groups','Year'],as_index=False).mean()
#order for the plot
orderset = cpiv.groupby(['Products_and_product_groups'],as_index=False).mean()
orderset = orderset.sort_values('CPI')
oilcor = oil_price
gdp_oil = pd.merge(gdp_m, oilcor, how='inner', on='Year')
gdpoildrop = ['Date','Day','Month']
gdp_oil = gdp_oil.drop(gdpoildrop,axis =1)
gdp_oil = gdp_oil.groupby('Year',as_index=False).mean()
gdp_oil
| Year | GDP | Oil_price | |
|---|---|---|---|
| 0 | 2012 | 1.828370e+12 | 87.942140 |
| 1 | 2013 | 1.846600e+12 | 98.068722 |
| 2 | 2014 | 1.805750e+12 | 92.849937 |
| 3 | 2015 | 1.556510e+12 | 48.834810 |
| 4 | 2016 | 1.527990e+12 | 43.559508 |
| 5 | 2017 | 1.649270e+12 | 50.886789 |
| 6 | 2018 | 1.725330e+12 | 64.881921 |
| 7 | 2019 | 1.742020e+12 | 57.054024 |
| 8 | 2020 | 1.645420e+12 | 39.683857 |
| 9 | 2021 | 1.990760e+12 | 68.169953 |
sns.lmplot(data=gdpop, x="GDP", y="Oil_price").set(title="Correlation GDP vs Oil price")
print(gdpop_pearson)
PearsonRResult(statistic=0.7157074215797746, pvalue=0.019936932189352493)
#setting parameters for the chart
sns.set_style("dark")
rcParams['figure.figsize'] = 10,6
#visualization of first line in GDP
gdp_oil_p = sns.lineplot(data=gdp_oil.GDP, color="b",label="GDP").set(title='GDP and Oil Price')
#second axis of visualization
ax2 = plt.twinx()
gdp_oil_p = sns.lineplot(data=gdp_oil.Oil_price, color="r", ax=ax2,label="Oil Prices")
gdp_oil_p.get_yaxis().set_major_formatter(mpl.ticker.FuncFormatter(lambda x, p: format(int(x), ',')))
plt.legend(bbox_to_anchor=(0.95, 1), ncol=1)
gdp_oil_p.set_xticks(range(len(gdp_oil)), labels=range(2012, 2022))
The purpose of the visualisations above is to introduce the next guiding question and begin the transition from the global scale to the local scale, where we will discuss Alberta. Both charts above show the same pattern, so the next step was to produce a scatter plot of GDP and oil prices to see whether any correlation exists. The relationship is visible in the chart, and the statistic confirms it: a strong correlation of about 0.716 (p < 0.05).
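To make the reading of the Pearson output concrete, here is a small sketch using the yearly figures from the table above, rounded (GDP in trillions of US$, oil price in $/bbl); the rounding barely changes the statistic:

```python
import numpy as np
from scipy.stats import pearsonr

# Yearly values rounded from the gdp_oil table (2012-2021).
gdp = np.array([1.83, 1.85, 1.81, 1.56, 1.53, 1.65, 1.73, 1.74, 1.65, 1.99])
oil = np.array([87.9, 98.1, 92.8, 48.8, 43.6, 50.9, 64.9, 57.1, 39.7, 68.2])

# r is the correlation coefficient; p tests the null hypothesis of
# no linear relationship.
r, p = pearsonr(gdp, oil)
significant = p < 0.05  # reject the null at the 5% level
```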
Key takeaways:
Guiding Question 2: Which of Alberta's sectors was most impacted during COVID?
Libraries used for visualizations:
Data set used:
Data Cleaning:
#plot data
rcParams['figure.figsize'] = 12,6
cpivb=sns.boxplot(data=cpiv,x="Products_and_product_groups", y="CPI",order=orderset['Products_and_product_groups'])
cpivb.tick_params(axis='x', rotation=90)
plt.title('Box plot of Products and product groups')
plt.show()
The graph above shows the CPI for each industry in the post-pandemic era as box plots. The CPI is primarily used to gauge the level of inflation that the general public is experiencing. The graph shows that the CPI for "energy" has increased the most, indicating that consumers feel inflation most acutely when paying their energy bills.
The shelter sector has also been significantly affected, ranking as the second hardest-hit sector in terms of rising service costs. Shelter covers people who pay rent to landlords for housing, as well as travellers staying in transient accommodation such as hotels and hostels.
In the post-pandemic era, we have also noticed a considerable increase in the cost of food and alcohol. It therefore stands to reason that Albertans may suddenly find it difficult to cover even their most basic expenses, such as energy bills, housing, and food.
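Since the CPI is an index, the inflation rate people actually experience is its percent change from one year to the next; a minimal sketch (the CPI numbers here are illustrative, not taken from the dataset):

```python
import pandas as pd

# CPI index values for a single category, by year (illustrative numbers).
cpi = pd.Series([145.0, 148.2, 158.9], index=[2019, 2020, 2021])

# Year-over-year inflation is the percent change of the index.
inflation_pct = cpi.pct_change() * 100
```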
# Oil Price of past 10 years.
df_oil_price = pd.read_csv("wti-crude-oil-prices-10-year-daily-chart.csv", skiprows=15)
# Parsing the Date column into Date-time format type.
#Citation: - https://stackoverflow.com/a/39206377/10912105
df_oil_price['date'] = pd.to_datetime(df_oil_price['date']).dt.year
# Ordering the DataFrame in descending order according to the "Date" column.
df_oil_price.sort_values(by = 'date', ascending = False, inplace = True)
df_oil_price.reset_index(drop=True, inplace=True)
# Renaming column names.
df_oil_price.rename(columns = {'date':'year', ' value':'oil price ($/bbl)'}, inplace = True)
# Removing all the rows which has 0 or NaN / NA Oil price.
df_oil_price['oil price ($/bbl)'] = df_oil_price['oil price ($/bbl)'].fillna(0)
df_oil_price = df_oil_price[df_oil_price['oil price ($/bbl)'] != 0]
# Citation: - https://stackoverflow.com/a/46827856/10912105
# Taking the mean of the oil price per barrel according to each year.
aggregation_functions = {'oil price ($/bbl)': 'mean'}
df_oil_price = df_oil_price.groupby(['year'], as_index=False).aggregate(aggregation_functions).reindex(columns=df_oil_price.columns)
# Ordering the DataFrame in descending order according to the "year" column.
df_oil_price.sort_values(by = 'year', ascending = False, inplace = True)
df_oil_price.reset_index(drop=True, inplace=True)
# Dropping all the values in the DataFrame where year < 2016.
df_oil_price.drop(df_oil_price[(df_oil_price['year'] < 2016)].index, inplace=True)
#print data frame: oil price royalties
df_oil_price.head()
| year | oil price ($/bbl) | |
|---|---|---|
| 0 | 2022 | 98.010197 |
| 1 | 2021 | 68.169953 |
| 2 | 2020 | 39.683857 |
| 3 | 2019 | 57.054024 |
| 4 | 2018 | 64.881921 |
In the next section, the following data-cleaning steps were taken.
Data sourcing and data cleansing make up around 80% of the code; the remaining 20% is dedicated to plotting the graph.
For plotting the graph, Python's Plotly was used.
A line graph is made below depicting the changes in gross revenue and net revenue of the oil and gas industry over time. The amount of royalty paid by the industry to the Alberta government is also shown.
The line graph highlights distinct time periods, with the COVID-19 pandemic marked.
# Data Scourcing of all Excel Files.
# Oil Royalties data from 2016 - 2021.
df_oil_royalties_2021 = pd.DataFrame(pd.read_excel("energy-2021-project-data-for-open-data-as-20220509-12am.xlsx"))
df_oil_royalties_2020 = pd.DataFrame(pd.read_excel("energy-2020-project-data-for-open-data-as-20220509-12am.xlsx"))
df_oil_royalties_2019 = pd.DataFrame(pd.read_excel("energy-2019-project-data-for-open-data-as-20220509-12am.xlsx"))
df_oil_royalties_2018 = pd.DataFrame(pd.read_excel("energy-2018-project-data-for-open-data-as-20220509-12am.xlsx"))
df_oil_royalties_2017 = pd.DataFrame(pd.read_excel("energy-2017-project-data-for-open-data-as-20220509-12am.xlsx"))
df_oil_royalties_2016 = pd.DataFrame(pd.read_excel("energy-2016-project-data-for-open-data-as-20220509-12am.xlsx"))
#Combining all the Excel files into 1 DataFrame.
df_oil_royalties = pd.concat([df_oil_royalties_2021, df_oil_royalties_2020, df_oil_royalties_2019, df_oil_royalties_2018, df_oil_royalties_2017, df_oil_royalties_2016])
# Data Cleaning of all Excel Files.
df_oil_royalties = df_oil_royalties.filter(['Operator Name', 'Reporting Year', 'Gross Revenue ($)', 'Gross Revenue ($/bbl)', 'Net Revenue ($)', 'Royalty Payable ($)'])
# Removing all the rows which have 0 or NaN / NA Gross Revenue.
# (Done before the integer cast, since NaN cannot be represented as int64.)
df_oil_royalties['Gross Revenue ($)'] = df_oil_royalties['Gross Revenue ($)'].fillna(0)
df_oil_royalties = df_oil_royalties[df_oil_royalties['Gross Revenue ($)'] != 0]
# Converting values from scientific notation to regular integer notation.
df_oil_royalties[['Gross Revenue ($)', 'Net Revenue ($)', 'Royalty Payable ($)']] = df_oil_royalties[['Gross Revenue ($)', 'Net Revenue ($)', 'Royalty Payable ($)']].astype('int64')
# Citation: - https://stackoverflow.com/a/46827856/10912105
# Taking the sum of the Gross Revenue, Net Revenue, and Royalty Payable. Taking the mean of Gross Revenue ($/bbl). Grouping by Year and Operator Name.
aggregation_functions = {'Gross Revenue ($)': 'sum', 'Gross Revenue ($/bbl)': 'mean', 'Net Revenue ($)': 'sum', 'Royalty Payable ($)' : 'sum'}
df_oil_royalties = df_oil_royalties.groupby(['Operator Name','Reporting Year'], as_index=False).aggregate(aggregation_functions).reindex(columns=df_oil_royalties.columns)
# Ordering the DataFrame in descending order according to column name "Reporting Year".
df_oil_royalties.sort_values(by = 'Reporting Year', ascending = False, inplace = True)
df_oil_royalties.reset_index(drop=True, inplace=True)
# Taking subset of the main DataFrame df_oil_royalties in order to show Gross Revenue separately.
df_gross_year = df_oil_royalties[['Reporting Year', 'Gross Revenue ($)']]
# Summing up the Gross Revenue year wise.
aggregation_functions = {'Gross Revenue ($)': 'sum'}
df_gross_year = df_gross_year.groupby(['Reporting Year'], as_index=False).aggregate(aggregation_functions).reindex(columns = df_gross_year.columns)
# Taking subset of the main DataFrame df_oil_royalties in order to show Gross Revenue per Barrel of Oil.
df_gross_per_bbp_year = df_oil_royalties[['Reporting Year', 'Gross Revenue ($/bbl)']]
# Taking the mean of the Gross Revenue per Barrel, year wise.
aggregation_functions = {'Gross Revenue ($/bbl)': 'mean'}
df_gross_per_bbp_year = df_gross_per_bbp_year.groupby(['Reporting Year'], as_index=False).aggregate(aggregation_functions).reindex(columns = df_gross_per_bbp_year.columns)
# Taking subset of the main DataFrame df_oil_royalties in order to show Net Revenue separately.
df_net_year = df_oil_royalties[['Reporting Year', 'Net Revenue ($)']]
# Summing up the Net Revenue year wise.
aggregation_functions = {'Net Revenue ($)': 'sum'}
df_net_year = df_net_year.groupby(['Reporting Year'], as_index=False).aggregate(aggregation_functions).reindex(columns = df_net_year.columns)
# Taking subset of the main DataFrame df_oil_royalties in order to show the royalty paid to the government of Alberta.
df_royalty_year = df_oil_royalties[['Reporting Year', 'Royalty Payable ($)']]
# Summing up the Royalty to the government paid year wise.
aggregation_functions = {'Royalty Payable ($)': 'sum'}
df_royalty_year = df_royalty_year.groupby(['Reporting Year'], as_index=False).aggregate(aggregation_functions).reindex(columns = df_royalty_year.columns)
#print data frame: oil royalties by year
df_royalty_year.head()
| Reporting Year | Royalty Payable ($) | |
|---|---|---|
| 0 | 2016 | 843040394 |
| 1 | 2017 | 2480251532 |
| 2 | 2018 | 2738473691 |
| 3 | 2019 | 4836819255 |
| 4 | 2020 | 1067122211 |
# Showing Gross Revenue, Net Revenue and Royalty payable together.
x = df_gross_year['Reporting Year']
y1 = df_gross_year['Gross Revenue ($)']
y2 = df_net_year['Net Revenue ($)']
y3 = df_royalty_year['Royalty Payable ($)']
# Create traces
oil_recovery = go.Figure()
oil_recovery= oil_recovery.add_trace(go.Scatter(x = x, y = y1,
mode='lines+markers',
name='Gross Revenue'))
oil_recovery=oil_recovery.add_trace(go.Scatter(x = x, y = y2,
mode='lines+markers',
name='Net Revenue'))
oil_recovery=oil_recovery.add_trace(go.Scatter(x = x, y = y3,
mode='lines+markers', name='Royalty'))
oil_recovery=oil_recovery.add_vrect(x0=2019, x1=2020,
annotation_text="COVID-19 Pandemic", annotation_position="top",
fillcolor="green", opacity=0.25, line_width=1)
oil_recovery=oil_recovery.update_layout(
title = "Gross Revenue, Net Revenue and Royalty with respect to Time",
xaxis_title = "Year",
yaxis_title = "($)",
legend_title="Legend",
font=dict(
family="Courier New, monospace",
size=12,
color="Black"
)
)
oil_recovery.show()
The oil-recovery graph above was made with data sourced from 1 CSV file and 6 Excel files. Irrelevant data was deleted from the DataFrame and only the useful columns were kept.
The same pattern can also be observed for the net revenue earned by oil and gas, and for the royalty paid to Alberta's government over the same period.
Key takeaways:
Guiding Question 3: How has inflation impacted different aspects of people's lives?
Libraries used for visualizations:
Data set used:
Data Cleaning:
import re
from dateutil.parser import parse  # used by isDate below
def isDate(stringDate):
    try:
        parse(stringDate, fuzzy=False)
        return True
    except ValueError:
        return False

def isfloat(a_string):
    try:
        float(a_string)
        return True
    except ValueError:
        return False

def getLROI(strInp):
    new_string = strInp.strip()
    if isfloat(new_string):
        return round(pd.to_numeric(new_string), 2)
    else:
        new_result = re.findall('[0-9]+', new_string)
        if len(new_result) > 2:
            numbers = pd.to_numeric(new_result)
            # mixed fraction, e.g. "10 3/4" -> 10.75
            return round(numbers[0] + (numbers[1] / numbers[2]), 2)
plt.style.use('bmh')
from dateutil.parser import parse
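As a quick standalone check of the numeric-guard pattern above (the helper is re-declared here so the snippet runs on its own):

```python
def isfloat(a_string):
    # True when the string parses cleanly as a float, False otherwise
    try:
        float(a_string)
        return True
    except ValueError:
        return False

print(isfloat("2.25"), isfloat("N/A "))  # a plain rate parses; the placeholder does not
```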
The data for lending interest rates to local authorities is parsed in the steps below, with special characters and NA values handled. To display the change in interest rates, the data is grouped and averaged over the years, and the aggregated comparisons are visualised with Plotly Express.
#DataSet for lending interest rates for loan to local authorities
##read lending rates from csv file , rename columns and handle special characters
lendingRates_rawData = pd.read_csv(r"tbf-ltla-historical-lending-rates.csv",
skiprows=8,na_filter=True,skip_blank_lines=True,skipinitialspace=True,on_bad_lines='skip',skipfooter=10,
false_values=['Date','3 years','5 years','10 years','15 years','20 years','25 years','30 years'],engine='python')
lendingRates_rawData.rename(columns = {'Unnamed: 0':'date', 'Unnamed: 1':'3year', 'Unnamed: 2':'5year', '2021':'10year', 'Unnamed: 4':'15year',
'Unnamed: 5':'20year', 'Unnamed: 6':'25year', 'Unnamed: 7':'30year'}, inplace = True)
lendingRates_rawData=lendingRates_rawData.replace(
to_replace='%',
value='',
regex=True)
##lendingRates_rawData - drop data with invalid dates, check for valid date and convert to date type, filtering based on date column
lendingRates_rawData.drop(lendingRates_rawData[(lendingRates_rawData.date.isnull()) | (lendingRates_rawData.date=='Date ')].index,inplace=True)
lendingRates_rawData['isDate'] = lendingRates_rawData.apply(lambda row : isDate(row['date']), axis = 1)
lendingRates_rawData.drop(lendingRates_rawData[ (lendingRates_rawData.isDate==False)].index,inplace=True)
lendingRates_rawData['date']=(pd.to_datetime(lendingRates_rawData['date'])).dt.to_period('M')
lendingRates_rawData.drop(lendingRates_rawData[lendingRates_rawData['date']<'2010-01'].index,inplace=True)
##lendingRates - Refined interested columns, handle na for filter Data from lendingRates_rawData
lendingRates=lendingRates_rawData.filter(["date", "25year"])
lendingRates[['25year']]=lendingRates[['25year']].astype("string")
lendingRates=lendingRates.replace(
to_replace="N/A ",
value='0',
regex=True)
##lendingRates - get valid interest dates
lendingRates['25year']=lendingRates['25year'].apply(lambda row : getLROI(row))
dfAvgMonthlyLendingRate=(lendingRates.sort_values('date').groupby(by=['date'],as_index=False)['25year'].mean())
lendingRates['year']=lendingRates['date'].dt.year
#print data frame: Lending Rates
lendingRates.head()
| | date | 25year | year |
|---|---|---|---|
| 0 | 2021-01 | 2.22 | 2021 |
| 1 | 2021-01 | 2.23 | 2021 |
| 2 | 2021-02 | 2.30 | 2021 |
| 3 | 2021-02 | 2.52 | 2021 |
| 4 | 2021-03 | 2.65 | 2021 |
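The date filtering above relies on pandas monthly `Period` values comparing directly against `'YYYY-MM'` strings. A self-contained illustration of that pattern (toy dates, not the lending data):

```python
import pandas as pd

df = pd.DataFrame({"date": ["2009-12-15", "2010-03-01", "2021-01-05"]})

# Collapse full dates to monthly periods, as done for the lending-rate dates.
df["date"] = pd.to_datetime(df["date"]).dt.to_period("M")

# Monthly periods compare against 'YYYY-MM' strings, so pre-2010 rows drop out.
kept = df[df["date"] >= "2010-01"]
print(kept)
```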
##lendingRates - visualise using violin plot
lending_plot = px.violin(data_frame=lendingRates, x='year', y='25year',
    box=True,
    title='25 yr lending rate to local authorities',
    labels={'year': 'Year', '25year': 'Lending rate to local authorities (%)'}
    )
lending_plot.show()
The graph above shows the interest rates on loans extended to local authorities for administering the province, and displays the relationship between interest rates and GDP growth. Lower interest rates are readily seen in the sluggish GDP growth years of 2015–2016 and 2019–2020. When the economy is recovering, interest rates rise as GDP growth improves; the higher rates are meant to keep pressure on prices from getting worse. Conversely, as growth slows, interest rates are reduced.
#DataSet for prime interest rates
##primeRates; prime interest rates from csv file , rename columns and convert to appropriate data types, handle na
primeRates = pd.read_csv(r"Prime-Rate-History-wowa.csv",
na_filter=True,skip_blank_lines=True,skipinitialspace=True,on_bad_lines='skip')
primeRates.rename(columns = {'Date':'date', 'Prime Rate':'primeRatePer', 'Bank of Canada Overnight Rate':'BOIOvernightChngPer'}, inplace = True)
avgPrimeRate2022=primeRates[pd.to_datetime(primeRates['date'])=='2022-10-01']['primeRatePer'].mean()
df2 = pd.DataFrame({'date': ['2022-11-01','2022-12-01'],
'primeRatePer': [avgPrimeRate2022,avgPrimeRate2022],
'BOIOvernightChngPer': [3.25, 3.25]})
#DataFrame.append is deprecated; pd.concat is the supported way to add the extra rows
primeRates = pd.concat([primeRates, df2], ignore_index=True)
primeRates['date']=pd.to_datetime(primeRates['date']).dt.to_period('M')
primeRates.drop(primeRates[primeRates['date']<'2010-01'].index,inplace=True)
primeRates=primeRates.sort_values('date')
primeRates['year']=primeRates['date'].dt.year
primeRates['month']=primeRates['date'].dt.month
primeRates.dropna()
| | date | primeRatePer | BOIOvernightChngPer | year | month |
|---|---|---|---|---|---|
| 626 | 2010-01 | 2.25 | 0.25 | 2010 | 1 |
| 627 | 2010-02 | 2.25 | 0.25 | 2010 | 2 |
| 628 | 2010-03 | 2.25 | 0.25 | 2010 | 3 |
| 629 | 2010-04 | 2.25 | 0.25 | 2010 | 4 |
| 630 | 2010-05 | 2.25 | 0.50 | 2010 | 5 |
| ... | ... | ... | ... | ... | ... |
| 777 | 2022-08 | 4.70 | 2.50 | 2022 | 8 |
| 778 | 2022-09 | 5.45 | 3.25 | 2022 | 9 |
| 779 | 2022-10 | 5.45 | 3.25 | 2022 | 10 |
| 780 | 2022-11 | 5.45 | 3.25 | 2022 | 11 |
| 781 | 2022-12 | 5.45 | 3.25 | 2022 | 12 |
156 rows × 5 columns
#print data frame: prime rates
primeRates.head()
| | date | primeRatePer | BOIOvernightChngPer | year | month |
|---|---|---|---|---|---|
| 626 | 2010-01 | 2.25 | 0.25 | 2010 | 1 |
| 627 | 2010-02 | 2.25 | 0.25 | 2010 | 2 |
| 628 | 2010-03 | 2.25 | 0.25 | 2010 | 3 |
| 629 | 2010-04 | 2.25 | 0.25 | 2010 | 4 |
| 630 | 2010-05 | 2.25 | 0.50 | 2010 | 5 |
#pivot prime rate data frame for visualising in heat map
rates = primeRates.pivot(index="month", columns="year", values="primeRatePer")
grid_kws = {"height_ratios": (.9, .05), "hspace": .3}
cmap = sns.cm.rocket_r
f, (ax, cbar_ax) = plt.subplots(2, gridspec_kw=grid_kws,figsize=(15, 10))
ax = sns.heatmap(rates, ax=ax,
cbar_ax=cbar_ax,
cbar_kws={"orientation": "horizontal"},annot=True,cmap=cmap)
ax.set_title('Heatmap of Interest rates')
The prime interest rate, the rate at which people get loans, is shown in the graph above. We can clearly see that the prime rate was quite stable between 2011 and 2014, then dropped to 2.7% during the 2015–2016 recession. A similar trend is observed around the pandemic: after stable rates between 2017 and 2019, interest rates were cut during the pandemic and then soared. Rising rates directly affect variable-rate mortgages and lines of credit. With the jump in interest rates to 5.5%, mortgage rates have increased, cooling the housing market as sales have slipped; it also reduces the spending power of people holding mortgages.
Unemployment: the rate among young adults and those over 25 can be found by reading the CSV file below. As part of data cleaning, columns are renamed and converted to the proper data types; NA values are then handled and the data is filtered for visualization.
#DataSet for un employment data
##Unemployment rate among young adults and people over 25; read from csv file, rename columns and convert to appropriate data types, handle na
unEmpRates = pd.read_csv(r"UnEmploymentData.csv",
na_filter=True,skip_blank_lines=True,skipinitialspace=True,on_bad_lines='skip')
unEmpRates[['Characteristic', 'NAICS',
'AgeGroup', 'Sex']]=unEmpRates[['Characteristic', 'NAICS',
'AgeGroup', 'Sex']].astype("string")
unEmpRates['When']=pd.to_datetime(unEmpRates['When']).dt.to_period('M')
unEmpRates.drop(unEmpRates[unEmpRates['When']<'2010-01'].index,inplace=True)
unEmpRates['Sex']=unEmpRates['Sex'].apply(lambda row : (row.lstrip()).rstrip())
unEmpRates['Year']=unEmpRates['When'].dt.year
filtered_unEmpRates = unEmpRates.loc[(unEmpRates['Sex'] == 'Both sexes' )&(unEmpRates['NAICS']=='All Industries')][['When','Year','NAICS','AgeGroup','Alberta']]
filtered_unEmpRates.dropna()
| | When | Year | NAICS | AgeGroup | Alberta |
|---|---|---|---|---|---|
| 29451 | 2010-01 | 2010 | All Industries | 15 to 24 years | 11.1 |
| 29452 | 2010-01 | 2010 | All Industries | 15 years and over | 6.7 |
| 29457 | 2010-02 | 2010 | All Industries | 15 to 24 years | 10.8 |
| 29458 | 2010-02 | 2010 | All Industries | 15 years and over | 6.8 |
| 29463 | 2010-03 | 2010 | All Industries | 15 to 24 years | 12.8 |
| ... | ... | ... | ... | ... | ... |
| 30352 | 2022-07 | 2022 | All Industries | 15 years and over | 4.8 |
| 30357 | 2022-08 | 2022 | All Industries | 15 to 24 years | 10.6 |
| 30358 | 2022-08 | 2022 | All Industries | 15 years and over | 5.4 |
| 30363 | 2022-09 | 2022 | All Industries | 15 to 24 years | 9.9 |
| 30364 | 2022-09 | 2022 | All Industries | 15 years and over | 5.5 |
306 rows × 5 columns
#print data frame: unemployment rates
filtered_unEmpRates.head()
| | When | Year | NAICS | AgeGroup | Alberta |
|---|---|---|---|---|---|
| 29451 | 2010-01 | 2010 | All Industries | 15 to 24 years | 11.1 |
| 29452 | 2010-01 | 2010 | All Industries | 15 years and over | 6.7 |
| 29457 | 2010-02 | 2010 | All Industries | 15 to 24 years | 10.8 |
| 29458 | 2010-02 | 2010 | All Industries | 15 years and over | 6.8 |
| 29463 | 2010-03 | 2010 | All Industries | 15 to 24 years | 12.8 |
ueplot = px.violin(filtered_unEmpRates, y="Alberta", x="Year", color="AgeGroup",
    box=True, points="all", width=1000,
    title='Unemployment Rate in Alberta over the years by age group',
    labels={"Alberta":"Unemployment Rate in Alberta"}
    )
ueplot.show()
The graph above displays the trend in Alberta's unemployment rate for young adults aged 15 to 24 and for the general population. The unemployment rate for both groups clearly skyrocketed in 2019–2020. Since 2021, following the pandemic phase, the unemployment rate has dropped for everyone. This can be explained by the rising GDP growth rate, which leads to job creation and a decrease in unemployment.
In the next steps, average weekly earnings are read from the CSV files. As part of data cleaning, the date is parsed and industries are grouped and assigned categories. The data is further grouped and filtered for visualisation using Plotly.
##avgEarnings data: read csv, skip na, change categorical values, handle special characters
avgEarnings=pd.read_csv(r"./average_weekly_earnings_monthly.csv",skiprows=0)
avgEarnings[['Geography','Estimate','Industry','Average_Weekly_Earnings']]=avgEarnings[['Geography','Estimate','Industry','Average_Weekly_Earnings']].astype("string")
avgEarnings['Reference_Date']=(pd.to_datetime(avgEarnings['Reference_Date'])).dt.to_period('M')
avgEarnings['Average_Weekly_Earnings']=avgEarnings['Average_Weekly_Earnings'].str.replace("$","",regex='True')
avgEarnings['Average_Weekly_Earnings']=pd.to_numeric(avgEarnings['Average_Weekly_Earnings'])
avgEarnings['year']=avgEarnings['Reference_Date'].dt.year
avrEarningsAlberta=avgEarnings.loc[(avgEarnings['Geography']=='Alberta') & (avgEarnings['year']>=2010)][['Reference_Date','year','Geography','Industry','Average_Weekly_Earnings']]
##map verbose industry names to short categories (same substitutions as before, expressed as one dict)
industry_map = {
    'Accommodation and food services [72]': "Hospitality",
    'Arts, entertainment and recreation [71]': "entertainment",
    'Construction [23]': "Construction",
    'Educational services [61]': "Education Service",
    'Health care and social assistance [62]': "Health and Service",
    'Finance and insurance [52]': "Finance",
    'Mining, quarrying, and oil and gas extraction [21]': "Oil and NaturalGas",
    'Transportation and warehousing [48-49]': "Transport",
    'Trade [41-45N]': "Trade",
    'Retail trade [44-45]': "Trade",
    'Wholesale trade [41]': "Trade",
    'Utilities [22]': "Misc",
    'Public administration [91]': "Administration",
    'Service producing industries [41-91N]': "Industrial Sector",
    'Industrial aggregate excluding unclassified businesses [11-91N]': "Industrial Sector",
    'Management of companies and enterprises [55]': "Industrial Sector",
    'Goods producing industries [11-33N]': "Industrial Sector",
    'Information and cultural industries [51]': "Industrial Sector",
    'Real estate and rental and leasing [53]': "Rentals",
    'Administrative and support, waste management and remediation services [56]': "Administration",
    'Durable goods [321N]': "goods",
    'Forestry, logging and support [11N]': "Forestry",
    'Other services (except public administration) [81]': "Misc",
    'Professional, scientific and technical services [54]': "Misc",
    'Non-durable goods [311N]': "Misc",
    'Manufacturing [31-33]': "goods",
}
avrEarningsAlberta['Industry'] = avrEarningsAlberta['Industry'].replace(industry_map)
##avrEarningsAlbertaGrpd data: filter , aggreagte data over industries , year for alberta
avrEarningsAlbertaGrpd=avrEarningsAlberta.sort_values('year').groupby(by=["Geography","Industry","year"])["Average_Weekly_Earnings"].mean()
avrEarningsAlbertaGrpd=avrEarningsAlbertaGrpd.reset_index()
avgEarnings_data = avgEarnings.loc[ (avgEarnings['Geography']!="Canada")].sort_values('year').groupby(by=["Geography","Industry","year"])["Average_Weekly_Earnings"].mean()
avrEarningsAlberta["Industry"].unique()
avgEarnings_data = avgEarnings_data.reset_index()
array(['Hospitality', 'Administration', 'entertainment', 'Construction',
'goods', 'Education Service', 'Finance', 'Forestry',
'Industrial Sector', 'Health and Service', 'Oil and NaturalGas',
'Misc', 'Rentals', 'Trade', 'Transport'], dtype=object)
#print data frame:
avgEarnings_data.head()
| | Geography | Industry | year | Average_Weekly_Earnings |
|---|---|---|---|---|
| 0 | Alberta | Accommodation and food services [72] | 2001 | 262.765833 |
| 1 | Alberta | Accommodation and food services [72] | 2002 | 272.731667 |
| 2 | Alberta | Accommodation and food services [72] | 2003 | 271.893333 |
| 3 | Alberta | Accommodation and food services [72] | 2004 | 287.201667 |
| 4 | Alberta | Accommodation and food services [72] | 2005 | 297.199167 |
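The aggregation above is a standard split-apply-combine: group by industry and year, then average the weekly earnings. A minimal self-contained illustration with made-up figures (not the actual earnings data):

```python
import pandas as pd

# Toy earnings records: two Oil observations in 2020 plus one Trade observation.
df = pd.DataFrame({
    "Industry": ["Oil", "Oil", "Trade"],
    "year": [2020, 2020, 2020],
    "Average_Weekly_Earnings": [2000.0, 2200.0, 900.0],
})

# Mean weekly earnings per (Industry, year) group.
out = df.groupby(["Industry", "year"], as_index=False)["Average_Weekly_Earnings"].mean()
print(out)
```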
##Visualise average earnings over the years to understand the trend
industry_chart = px.line(avrEarningsAlbertaGrpd, x = "year", y = "Average_Weekly_Earnings",
    color = "Industry",
    color_discrete_map={
        'Oil and NaturalGas': "red",
        'Finance': "blue",
        'Forestry': "magenta",
        'Hospitality': "darkviolet",
        'Administration': "indianred",
        'entertainment': "mediumaquamarine",
        'Construction': "olivedrab",
        'goods': "steelblue",
        'Education Service': "darkslategray",
        'Industrial Sector': "deeppink",
        'Health and Service': "black",
        'Misc': "chocolate",
        'Rentals': "darkviolet",
        'Trade': "dodgerblue",
        'Transport': "green"},
    width=1000,
    line_dash="Industry",
    labels={"Average_Weekly_Earnings": "Average Earnings grouped over years"},
    title="Average earnings categorised by industry over the years in Alberta"
    )
industry_chart.show()
The graph above displays the average income earned by Albertans across industries. We can infer from it that Albertans' earnings have gradually increased since the outbreak.
Key takeaways
Guiding Question 4: General situation
Libraries used for visualizations:
Data set used:
Data Cleaning:
rdata_gdppercapita = pd.read_csv("./canada-gdp-per-capita.csv",skiprows=16,parse_dates=['date'])
#manipulation of date
yseriesg = rdata_gdppercapita['date'].map(lambda x: x.year)
rdata_gdppercapita.insert(1,'Year',yseriesg)
#manipulating data frame
gdppc_todrop = [" ", "date"]
gdppc = rdata_gdppercapita.drop(gdppc_todrop,axis=1)
gdppc.rename(columns={' GDP Per Capita (US $)':'GDP_per_capita',' Annual Growth Rate (%)':'Annual_Growth_Rate'},inplace=True)
#plot
gdppc["Color"] = np.where(gdppc["Annual_Growth_Rate"]<0, 'red', 'green')
gdppcp = make_subplots(specs=[[{"secondary_y": True}]])
gdppcp=gdppcp.add_trace(go.Bar(x=gdppc['Year'],y=gdppc['Annual_Growth_Rate'],
marker_color=gdppc['Color'],name='Growth Rate'),
secondary_y=False)
gdppcp=gdppcp.add_trace(go.Scatter(x=gdppc['Year'],y=gdppc['GDP_per_capita'],
name="GDP Per Capita"),secondary_y=True,)
gdppcp=gdppcp.update_yaxes(range=[-40000, 55000],secondary_y=True)
gdppcp=gdppcp.update_yaxes(title="GDP per capita ($US)",secondary_y=True)
gdppcp=gdppcp.update_yaxes(title="Growth Rate",secondary_y=False)
gdppcp=gdppcp.update_layout(dragmode="zoom",hovermode="x",legend=dict(traceorder="reversed"),height=600,
template="simple_white",margin=dict(t=100,b=100),)
gdppcp=gdppcp.update_layout(title="GDP per capita and Growth Rate for Canada")
gdppcp.show()
Conclusion
The aim of our project was to understand how COVID-19 has impacted Canada (a bird's-eye view), Alberta (the province that concerns us most), and the economic position of Alberta's consumers and producers.
To measure the economic health of Canada, we look at the fluctuation in its GDP. The biggest and most important sector (export-wise) in Canada as a whole is the oil and gas and mining sector. Therefore, we also look at oil price fluctuations, since high oil prices mean a growing economy for Canada and vice versa.
We observed that during COVID-19, the GDP of Canada took a big hit due to low oil prices and reduced demand for the goods Canada mainly exports. We can also see that after 2020 (post-pandemic) there has been a huge rebound, a positive indicator signaling a rapid recovery.
We also measured the correlation of Canada's GDP with oil prices to see how strongly oil prices affect it. We found a positive correlation of 0.716, which indicates that the Canadian economy is very sensitive to global oil prices and that high oil prices are good for it.
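A pairwise correlation like the 0.716 reported here can be computed with pandas' `Series.corr` (Pearson by default). The series below are made-up stand-ins, not the actual GDP and oil-price data:

```python
import pandas as pd

# Hypothetical yearly GDP and oil-price series (illustrative only).
gdp = pd.Series([1600, 1550, 1500, 1650, 1700], dtype=float)
oil = pd.Series([60, 55, 40, 70, 80], dtype=float)

# Pearson correlation coefficient between the two series.
r = gdp.corr(oil)
print(round(r, 3))
```

A value near +1 means the two series move together, as the GDP and oil-price data did in our analysis.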
When it comes to Alberta, we wanted to know how the consumers in Alberta province and major producers in Alberta were affected due to COVID-19. The biggest industry in Alberta is Oil and Gas, therefore we take a look in great detail at how the Oil & Gas industry of Alberta was affected.
We found that in pre-pandemic times, the gross revenue earned by oil and gas was increasing steadily at a slow pace; during the pandemic there was a huge dip in gross revenue (huge losses) due to lower oil demand; and post-pandemic, the industry has made a roaring comeback with record-high profits.
Until now we have talked about how Canada and Alberta in general were affected; now we turn to how the common man and the general public of Alberta were affected by the pandemic. To find out, we looked at the post-pandemic CPI and the unemployment rate. We found that the cost of living for basics such as energy, housing, food and alcohol has risen significantly post-pandemic, creating real hardship for Albertans who were living paycheque to paycheque before the pandemic struck. We also observed a huge spike in the unemployment rate during the COVID-19 pandemic (35% of people unemployed at one point in time). The rate has come down in 2022, but it is still somewhat higher than in the pre-pandemic era, which is concerning because it shows that new jobs have not been created fast enough.
We also observed the response of the Canadian government and the central bank during the pandemic. These institutions tried their best to prevent a deep recession or depression during turbulent times. Banks extended loans and credit during the pandemic at very low interest rates to create demand and hold the economy steady. Interest rates have since gone up again, as conditions have improved and the economy has entered its recovery phase.
In the end, we show that during the pandemic the GDP of Canada contracted by around 12.5%, and post-pandemic it grew by 20%, which is a good sign. All in all, even though the economy has largely recovered, there is still some inertia in the system regarding unemployment rates and the general well-being of Canadians. We expect these conditions to improve as time moves on.
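As a sanity check on those two figures, a 12.5% contraction followed by 20% growth can be composed directly. Indexing the pre-pandemic economy at 100, the economy ends about 5% above its pre-pandemic level:

```python
# Index the pre-pandemic economy at 100, then apply the two reported changes.
pre_pandemic = 100.0
trough = pre_pandemic * (1 - 0.125)   # 12.5% contraction
recovered = trough * (1 + 0.20)       # 20% rebound on the smaller base

print(round(trough, 2), round(recovered, 2))
```

Note that percentage changes compound on the new base, which is why a 20% rebound after a 12.5% drop yields a net gain of only about 5%, not 7.5%.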